1 Some Probability Theory



1.1 Constrained distributions 

A random experiment has $n$ possible results at each trial; so in $N$ trials there
are $n^N$ conceivable outcomes. (We use the word "result" for a single trial, while
"outcome" refers to the experiment as a whole; thus one outcome consists of an
enumeration of $N$ results, including their order. For instance, ten tosses of a die
($n = 6$, $N = 10$) might have the outcome "1326642335.") Each outcome yields a
set of sample numbers $\{N_i\}$ and relative frequencies $\{f_i = N_i/N,\ i = 1 \ldots n\}$. In
many situations the outcome of a random experiment is not known completely:
One does not know the order in which the individual results occurred, and often
one does not even know all $n$ relative frequencies $\{f_i\}$ but only a smaller number
$m$ ($m < n$) of linearly independent constraints

$$ \sum_{i=1}^{n} G_a^i f_i = g_a , \quad a = 1 \ldots m . \qquad (1) $$

As a simple example consider a loaded die. Observations on this badly balanced
die have shown that 6 occurs twice as often as 1; nothing peculiar was
observed for the other faces. Given this information only and nothing else, i.e.,
not making use of any additional information that we might get from inspection
of the die or from past experience with dice in general, all we know is a single
constraint of the form (1) with

$$ G_1^i = \begin{cases} 2 & : i = 1 \\ 0 & : i = 2 \ldots 5 \\ -1 & : i = 6 \end{cases} \qquad (2) $$

and $g_1 = 0$.

The available data -in the form of linear constraints- are generally not sufficient
to reconstruct unambiguously the relative frequencies $\{f_i\}$. These frequencies
may be regarded as Cartesian coordinates of a point in an $n$-dimensional
vector space. The $m$ linear constraints, together with $f_i \in [0, 1]$ and the
normalization condition $\sum_i f_i = 1$, then just restrict the allowed points to some portion
of an $(n - m - 1)$-dimensional hyperplane.
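The bookkeeping of this section can be sketched in a few lines of Python. This is only an illustration: the helper names `relative_frequencies` and `constraint_value` are hypothetical and not part of the text. It extracts the relative frequencies from the ten-toss outcome quoted above and evaluates the left-hand side of the constraint (1) for the loaded-die observable (2).

```python
from collections import Counter

def relative_frequencies(outcome, n):
    """Return [f_1, ..., f_n] for an outcome given as a string of results 1..n."""
    counts = Counter(int(ch) for ch in outcome)
    N = len(outcome)
    return [counts.get(i, 0) / N for i in range(1, n + 1)]

def constraint_value(G, f):
    """Left-hand side of Eq. (1): sum_i G^i f_i."""
    return sum(Gi * fi for Gi, fi in zip(G, f))

outcome = "1326642335"            # the ten tosses quoted in the text
f = relative_frequencies(outcome, n=6)
G1 = [2, 0, 0, 0, 0, -1]          # observable of Eq. (2)
g1 = constraint_value(G1, f)
print(f, g1)
```

The order of the results is discarded as soon as the frequencies are formed, which is exactly the loss of information described above. This particular outcome happens to contain one 1 and two 6s, so it satisfies the loaded-die constraint with $g_1 = 0$.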

1.2 Concentration theorem 

Given an a priori probability distribution $\{p_i\}$ for the results $i = 1 \ldots n$, the
probability that $N$ trials will yield the -generally different- relative frequencies
$\{f_i\}$ is

$$ \mathrm{prob}(\{f_i\}|\{p_i\}, N) = \frac{N!}{N_1! \cdots N_n!}\; p_1^{N_1} \cdots p_n^{N_n} . \qquad (3) $$

Here the second factor is the probability for one specific outcome with sample
numbers $\{N_i\}$, and the first factor counts the number of all outcomes that give
rise to the same set of sample numbers. With the definition

$$ I_p(f) := -\sum_i f_i \ln \frac{f_i}{p_i} \qquad (4) $$

and the shorthand notations $f = \{f_i\}$, $p = \{p_i\}$ we can also write

$$ \mathrm{prob}(f|p, N) = \mathrm{prob}(f|f, N)\, \exp[N I_p(f)] . \qquad (5) $$
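As a numerical sanity check of Eq. (5), the following sketch evaluates both sides for a small example. The function names and the chosen sample numbers are illustrative assumptions, not taken from the text; the prior is taken to be uniform.

```python
import math

def multinomial_prob(counts, p):
    """Eq. (3): probability of the sample numbers `counts` given probabilities p."""
    coef = math.factorial(sum(counts))
    for c in counts:
        coef //= math.factorial(c)      # stays an exact integer at every step
    prob = float(coef)
    for c, pi in zip(counts, p):
        prob *= pi ** c
    return prob

def I_p(f, p):
    """Eq. (4): I_p(f) = -sum_i f_i ln(f_i / p_i); terms with f_i = 0 contribute 0."""
    return -sum(fi * math.log(fi / pi) for fi, pi in zip(f, p) if fi > 0)

counts = [1, 2, 3, 1, 1, 2]             # sample numbers N_i of a ten-toss outcome
N = sum(counts)
f = [c / N for c in counts]
p = [1 / 6] * 6                         # uniform a priori distribution (assumption)

lhs = multinomial_prob(counts, p)
rhs = multinomial_prob(counts, f) * math.exp(N * I_p(f, p))
print(lhs, rhs)
```

Both numbers agree up to rounding, since the combinatorial factor is the same on both sides and the remaining ratio $\prod_i (p_i/f_i)^{N_i}$ is precisely $\exp[N I_p(f)]$.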

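In particular, for two different data sets $\{f_i\}$ and $\{f'_i\}$ the ratio of their respective
probabilities is given by

$$ \frac{\mathrm{prob}(f|p, N)}{\mathrm{prob}(f'|p, N)} = \frac{\mathrm{prob}(f|f, N)}{\mathrm{prob}(f'|f', N)}\, \exp[N(I_p(f) - I_p(f'))] , \qquad (6) $$

where, by virtue of Stirling's formula

$$ x! \approx \sqrt{2\pi x}\; x^x e^{-x} , \qquad (7) $$

it is asymptotically

$$ \frac{\mathrm{prob}(f|f, N)}{\mathrm{prob}(f'|f', N)} \approx \prod_i \sqrt{\frac{f'_i}{f_i}} . \qquad (8) $$

As the latter ratio is independent of $N$, for large $N$ and nearby distributions
$f' \approx f$ the variation of $\mathrm{prob}(f|p, N)/\mathrm{prob}(f'|p, N)$ is completely dominated by
the exponential:

$$ \frac{\mathrm{prob}(f|p, N)}{\mathrm{prob}(f'|p, N)} \approx \exp[N(I_p(f) - I_p(f'))] . \qquad (9) $$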

Hence the probability with which any given frequency distribution $f$ is realized is
essentially determined by the quantity $I_p(f)$. The larger this quantity, the more
likely the frequency distribution is realized.

Consider now all frequency distributions allowed by $m$ linearly independent
constraints. As we discussed earlier, the allowed distributions can be visualized
as points in some portion of an $(n - m - 1)$-dimensional hyperplane. In this
hyperplane portion there is a unique point at which the quantity $I_p(f)$ attains a
maximum $I_p^{\max}$; we call this point the "maximal point" $f^{\max}$. (That the maximal
point is indeed unique can be seen as follows: Suppose there were not one but
two maximal points corresponding to frequency distributions $f^{(1)}$ and $f^{(2)}$. Then
the mixture $f = (f^{(1)} + f^{(2)})/2$ would have $I_p(f) > I_p^{\max}$, which would be a
contradiction.) It is possible to define new coordinates $\{x_1 \ldots x_{n-m-1}\}$ in the
hyperplane such that

• they are linear functions of the $\{f_i\}$;

• the origin ($x = 0$) is at the maximal point; and

• in the vicinity of the maximal point

$$ I_p(x) = I_p^{\max} - a r^2 + O(r^3) , \quad a > 0 , \qquad (10) $$

where

$$ r := \sqrt{\sum_{j=1}^{n-m-1} x_j^2} . \qquad (11) $$



Frequency distributions that satisfy the given constraints (1) and whose $I_p(x)$
differs from $I_p^{\max}$ by more than $\Delta I$ thus lie outside a hypersphere around the
maximal point, the sphere's radius $R$ being given by $aR^2 = \Delta I$. The probability
that $N$ trials will yield such a frequency distribution outside the hypersphere is

$$ \mathrm{prob}(I_p < (I_p^{\max} - \Delta I)\,|\,m\ \text{constraints}) = \frac{\int_R^\infty dr\, r^{n-m-2} \exp(-N a r^2)}{\int_0^\infty dr\, r^{n-m-2} \exp(-N a r^2)} . \qquad (12) $$

Here the factors $r^{n-m-2}$ in the integrand are due to the volume element, while
the exponentials $\exp(-N a r^2) = \exp(N(I_p(x) - I_p^{\max}))$ stem from the ratio (9).
Substituting $t = N a r^2$, defining

$$ s := (n - m - 3)/2 \qquad (13) $$

and using

$$ \Gamma(s + 1) = \int_0^\infty dt\, t^s \exp(-t) \qquad (14) $$

one may also write

$$ \mathrm{prob}(I_p < (I_p^{\max} - \Delta I)\,|\,m\ \text{constraints}) = \frac{1}{\Gamma(s + 1)} \int_{N \Delta I}^\infty dt\, t^s \exp(-t) ; \qquad (15) $$

which for large $N$ ($N \gg s/\Delta I$) can be approximated by

$$ \mathrm{prob}(I_p < (I_p^{\max} - \Delta I)\,|\,m\ \text{constraints}) \approx \frac{1}{\Gamma(s + 1)}\, (N \Delta I)^s \exp(-N \Delta I) . \qquad (16) $$

As the number $N$ of trials increases, this probability rapidly tends to zero for
any finite $\Delta I$. As $N \to \infty$, therefore, it becomes virtually certain that the (aside
from $m$ constraints) unknown frequency distribution has an $I_p$ very close to $I_p^{\max}$.
Hence not only does the maximal point represent the frequency distribution that
is the most likely to be realized (cf. Eq. (9)); but in addition, as $N$ increases,
all other -theoretically allowed- frequency distributions become more and more
concentrated near this maximal point. Any frequency distribution other than the
maximal point becomes highly atypical of those allowed by the constraints.
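To get a feeling for how quickly the concentration sets in, the asymptotic formula (16) can be evaluated directly. The sketch below is a minimal illustration; the function name `tail_probability` and the sample values ($n = 6$, $m = 1$, $\Delta I = 0.01$) are assumptions chosen for the example, not values from the text.

```python
import math

def tail_probability(N, delta_I, n, m):
    """Asymptotic Eq. (16); only meaningful for N >> s / delta_I."""
    s = (n - m - 3) / 2                  # Eq. (13)
    x = N * delta_I
    return x ** s * math.exp(-x) / math.gamma(s + 1)

# Die with one constraint (n = 6, m = 1, hence s = 1) and a fixed gap delta_I.
for N in (1000, 2000, 4000):
    print(N, tail_probability(N, 0.01, n=6, m=1))
```

Doubling the number of trials does not merely halve the probability of finding a distribution with $I_p$ below $I_p^{\max} - \Delta I$; it suppresses it exponentially.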
1.3 Frequency estimation

We have seen that the knowledge of $m$ ($m < n$) "averages" (1) constrains, but
fails to specify uniquely, the relative frequencies $\{f_i\}$. In view of this incomplete
information the relative frequencies must be estimated. Our previous considerations
suggest that the most reasonable estimate is the maximal point: that
distribution which, while satisfying all the constraints, maximizes $I_p(f)$. This
leads to a variational equation

$$ \delta \left[ \sum_i f_i \ln \frac{f_i}{p_i} + \eta \sum_i f_i + \sum_a \lambda^a \sum_i G_a^i f_i \right] = 0 , \qquad (17) $$

where the constraints, as well as the normalization condition $\sum_i f_i = 1$, have been
implemented by means of Lagrange multipliers. Its solution is of the form

$$ f_i^{\max} = \frac{1}{Z} \exp\left[ \ln p_i - \langle \ln p \rangle_p - \sum_a \lambda^a G_a^i \right] \qquad (18) $$

with

$$ Z = \sum_i \exp\left[ \ln p_i - \langle \ln p \rangle_p - \sum_a \lambda^a G_a^i \right] . \qquad (19) $$

The term

$$ \langle \ln p \rangle_p := \sum_j p_j \ln p_j \qquad (20) $$

has been introduced by convention; it cancels from the ratio in (18) and so does
not affect the frequency estimate. The expression in the exponent simplifies if
and only if the a priori distribution $\{p_i\}$ is uniform: In this case,

$$ \ln p_i - \langle \ln p \rangle_p = 0 . \qquad (21) $$

The $m$ Lagrange parameters $\{\lambda^a\}$ must be adjusted such as to yield the correct
prescribed averages $\{g_a\}$. They can be determined from

$$ \frac{\partial}{\partial \lambda^a} \ln Z = -g_a , \qquad (22) $$

a set of $m$ simultaneous equations for $m$ unknowns. Finally, inserting (18) into
the definition of $I_p(f)$ gives

$$ I_p^{\max} = \langle \ln p \rangle_p + \ln Z + \sum_a \lambda^a g_a . \qquad (23) $$

There remains the task of specifying the -possibly nonuniform- a priori probability
distribution $\{p_i\}$. The $\{p_i\}$ are those probabilities one would assign before
having asserted the existence of the constraints (1); i.e., being still in a state of
ignorance. This "ignorance distribution" can usually be determined on the basis
of symmetry considerations: If the problem at hand is a priori invariant under
some characteristic group then the $\{p_i\}$, too, must exhibit this same group
invariance.^1 For example, if a priori we do not know anything about the properties
of a given die then our prior ignorance extends to all faces equally. The problem
is therefore invariant under a relabelling of the faces, which trivially implies
$\{p_i = 1/6\}$. In more complicated random experiments, especially those involving
continuous and hence coordinate-dependent distributions, the task of specifying
the a priori distribution may be less straightforward.^2

For illustration let us return to the example of the loaded die, characterized
solely by the single constraint (2). What estimates should we make of the relative
frequencies $\{f_i\}$ with which the different faces appeared? Taking the a priori
probability distribution -assigned to the various faces before one has asserted the
die's imperfection- to be uniform, $\{p_i = 1/6\}$, the best estimate (18) for the
frequency distribution reads



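$$ f_i^{\max} = Z^{-1} \times \begin{cases} \exp(-2\lambda^1) & : i = 1 \\ 1 & : i = 2 \ldots 5 \\ \exp(\lambda^1) & : i = 6 \end{cases} \qquad (24) $$

with only a single Lagrange parameter $\lambda^1$ and

$$ Z = \exp(-2\lambda^1) + 4 + \exp(\lambda^1) . \qquad (25) $$

The Lagrange parameter is readily determined from

$$ \frac{\partial}{\partial \lambda^1} \ln Z = -g_1 = 0 \qquad (26) $$

with solution

$$ \lambda^1 = (\ln 2)/3 . \qquad (27) $$

This in turn gives the numerical estimates

$$ f_i^{\max} = \begin{cases} 0.107 & : i = 1 \\ 0.170 & : i = 2 \ldots 5 \\ 0.214 & : i = 6 \end{cases} \qquad (28) $$

with an associated

$$ I_p^{\max} = \ln(1/6) + \ln Z = -0.019 . \qquad (29) $$

^1 The rationale underlying this consistency requirement has historically been called the
"Principle of Insufficient Reason" (J. Bernoulli, Ars Conjectandi, 1713).

^2 See for example E. T. Jaynes, Prior probabilities, IEEE Trans. Systems Sci. Cyb. 4, 227
(1968).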
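The loaded-die numbers above are easy to reproduce. The following sketch simply evaluates Eqs. (24), (25), (27) and (29) for a uniform prior; the variable names are illustrative only.

```python
import math

lam = math.log(2) / 3                          # Eq. (27)
Z = math.exp(-2 * lam) + 4 + math.exp(lam)     # Eq. (25)

# Eq. (24): faces 1, 2..5, 6
f = [math.exp(-2 * lam) / Z] + [1 / Z] * 4 + [math.exp(lam) / Z]

I_max = math.log(1 / 6) + math.log(Z)          # Eq. (29), using g_1 = 0

print([round(x, 3) for x in f], round(I_max, 3))
```

The estimate respects the only piece of information we have, namely that 6 occurs twice as often as 1, and it leaves the four remaining faces equally likely among themselves.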
The above algorithm for estimating frequencies can be iterated. Suppose
that beyond the $m$ constraints (1) we learn of $l$ additional, linearly independent
constraints

$$ \sum_{i=1}^{n} G_a^i f_i = g_a , \quad a = (m + 1) \ldots (m + l) . \qquad (30) $$

In order to make an improved estimate that takes these additional data into
account we can either (i) starting from the same a priori distribution $p$ as before,
apply the algorithm to the total set of $(m + l)$ constraints; or (ii) iterate: use
the previous estimate (18), which was based on the first $m$ constraints only, as a
new a priori distribution, $f^{\max} \to p'$, and then repeat the algorithm just for the $l$
additional constraints. Both procedures give the same improved estimate $f^{\max\prime}$.
Associated with this improved estimate is

$$ I_p^{\max\prime} = I_p^{\max} + I_{p'}^{\max} . \qquad (31) $$

1.4 Hypothesis testing 

Now we consider random experiments for which complete frequency data are
available. Suppose that, based on some insight we have into the systematic
influences affecting the experiment, we conjecture that the observed relative
frequencies can be fully characterized by a set of constraints of the -by now familiar-
form (1), and that hence the observed relative frequencies can be fitted with a
maximal distribution (18). This maximal distribution contains $m$ fit parameters
$\{\lambda^a\}$ (the Lagrange parameters) whose specific values depend on the averages
$\{g_a\}$, which in turn are extracted from the data. It represents our theoretical
model or hypothesis.

In general, the experimental frequencies $f$ and the theoretical fit $f^{\max}$ do
not agree exactly. Must the hypothesis therefore be rejected, or is the deviation
merely a statistical fluctuation? The answer is furnished by the concentration
theorem: Let $N$ be the number of trials performed to establish the experimental
distribution, let

$$ \Delta I := I_p^{\max} - I_p(f) \qquad (32) $$

and $s = (n - m - 3)/2$. For large $N$ ($N \gg s/\Delta I$) the probability that statistical
fluctuations alone yield an $I_p$-difference as large as $\Delta I$ is given by (16); typically
the hypothesis is rejected whenever this probability is below 5%,^3

$$ \mathrm{prob}(I_p < (I_p^{\max} - \Delta I)\,|\,m\ \text{constraints}) < 5\% . \qquad (33) $$

Rejecting a hypothesis means that the chosen set of constraints was not complete,
and hence that important systematic effects have been overlooked. These must be
incorporated in the form of additional constraints. In this fashion one can proceed
iteratively from simple to ever more sophisticated models until the deviation of
the fit from the experimental data ceases to be statistically significant.

^3 The hypothesis test presented here is closely related to the better-known $\chi^2$ test.
  i      $f_i$       $\Delta_i$
  1     0.16230    -0.00437
  2     0.17245    +0.00578
  3     0.14485    -0.02182
  4     0.14205    -0.02462
  5     0.18175    +0.01508
  6     0.19660    +0.02993

Table 1: Wolf's die data: frequency distribution $f$ and its deviation $\Delta$ from the
uniform distribution.



1.5 Jaynes' analysis of Wolf's die data 

The above prescription for testing hypotheses and -if rejected- for iteratively
improving them by enlarging the set of constraints has been lucidly illustrated
by E. T. Jaynes in his analysis of Wolf's die data.^4 Rudolph Wolf (1816-1893), a
Swiss astronomer, had performed a number of random experiments, presumably
to check the validity of statistical theory. In one of these experiments a die
(actually two dice, but only one of them is of interest here) was tossed 20,000
times in a way that precluded any systematic favoring of any face over any other.
The observed relative frequencies $\{f_i\}$ and their deviations $\{\Delta_i = f_i - p_i\}$ from
the a priori probabilities $\{p_i = 1/6\}$ are given in Table 1. Associated with the
observed distribution is

$$ I_p(f) = -0.006769 . \qquad (34) $$

Our "null hypothesis" H0 is that the die is ideal and hence that there are no
constraints needed to characterize any imperfection ($m = 0$); the deviation of the
experimental from the uniform distribution, with associated

$$ I_p^{\max(H0)} = 0 , \qquad (35) $$

is merely a statistical fluctuation. However, the probability that statistical
fluctuations alone yield an $I_p$-difference as large as

$$ \Delta I^{H0} = I_p^{\max(H0)} - I_p(f) = 0.006769 \qquad (36) $$

is practically zero: Using Eq. (16) with $N = 20{,}000$ and $s = 3/2$ we find

$$ \mathrm{prob}(I_p < (I_p^{\max(H0)} - \Delta I^{H0})\,|\,0\ \text{constraints}) \sim 10^{-56} . \qquad (37) $$

^4 E. T. Jaynes, Concentration of distributions at entropy maxima, in: E. T. Jaynes, Papers on
Probability, Statistics and Statistical Mechanics, ed. by R. D. Rosenkrantz, Kluwer Academic,
Dordrecht (1989).
Therefore, the null hypothesis is rejected: The die cannot be perfect.
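The test of the null hypothesis can be retraced numerically. The sketch below takes the relative frequencies of Table 1, with $f_6 = 0.19660$ so that the frequencies sum to one and reproduce the tabulated deviation $\Delta_6$, and evaluates Eqs. (34) and (16). Function and variable names are illustrative assumptions.

```python
import math

f = [0.16230, 0.17245, 0.14485, 0.14205, 0.18175, 0.19660]   # Table 1
p = [1 / 6] * 6                                              # ideal die
N = 20000

def I_p(f, p):
    """Eq. (4)."""
    return -sum(fi * math.log(fi / pi) for fi, pi in zip(f, p))

def tail_probability(N, delta_I, n, m):
    """Asymptotic Eq. (16)."""
    s = (n - m - 3) / 2
    x = N * delta_I
    return x ** s * math.exp(-x) / math.gamma(s + 1)

I_obs = I_p(f, p)                 # compare Eq. (34)
delta_I = 0.0 - I_obs             # with no constraints the maximum of I_p is I_p(p) = 0
prob = tail_probability(N, delta_I, n=6, m=0)
print(I_obs, prob)
```

The resulting probability is of order $10^{-56}$, so attributing the observed deviations to chance is not a tenable position.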

Our analysis need not stop here. Not knowing the mechanical details of the 
die we can still formulate and test hypotheses as to the nature of its imperfections. 
Jaynes argued that the two most likely imperfections are: 

• a shift of the center of gravity due to the mass of ivory excavated from the
spots, which being proportional to the number of spots on any side, should
make the "observable"

$$ G_1^i := i - 3.5 \qquad (38) $$

have a nonzero average $g_1 \neq 0$; and

• errors in trying to machine a perfect cube, which will tend to make one
dimension (the last side cut) slightly different from the other two. It is
clear from the data that Wolf's die gave a lower frequency for the faces
(3,4); and therefore that the (3-4) dimension was greater than the (1-6) or
(2-5) ones. The effect of this is that the "observable"

$$ G_2^i := \begin{cases} 1 & : i = 1, 2, 5, 6 \\ -2 & : i = 3, 4 \end{cases} \qquad (39) $$

has a nonzero average $g_2 \neq 0$.

Our hypothesis H2 is that these are the only two imperfections present. More
specifically, we conjecture that the observed relative frequencies are characterized
by just two constraints ($m = 2$) imposed by the measured averages

$$ g_1 = 0.0983 \quad \text{and} \quad g_2 = 0.1393 ; \qquad (40) $$

and that hence the observed relative frequencies can be fitted with a maximal
distribution

$$ f_i^{\max(H2)} = \frac{1}{Z} \exp\left( -\sum_{a=1}^{2} \lambda^a G_a^i \right) . \qquad (41) $$

In order to test our hypothesis we determine

$$ Z = \sum_{i=1}^{6} \exp\left( -\sum_{a=1}^{2} \lambda^a G_a^i \right) , \qquad (42) $$

fix the Lagrange parameters by requiring

$$ \frac{\partial}{\partial \lambda^a} \ln Z = -g_a , \qquad (43) $$

and then calculate

$$ I_p^{\max(H2)} = \ln(1/6) + \ln Z + \sum_{a=1}^{2} \lambda^a g_a . \qquad (44) $$
With this algorithm Jaynes found

$$ I_p^{\max(H2)} = -0.006534 \qquad (45) $$

and thus

$$ \Delta I^{H2} = I_p^{\max(H2)} - I_p(f) = 0.000235 . \qquad (46) $$

The probability for such an $I_p$-difference to occur as a result of statistical
fluctuations is (with now $s = 1/2$)

$$ \mathrm{prob}(I_p < (I_p^{\max(H2)} - \Delta I^{H2})\,|\,2\ \text{constraints}) \approx 2\% , \qquad (47) $$

much larger than the previous $10^{-56}$ but still below the usual acceptance bound
of 5%. The more sophisticated model H2 is therefore a major improvement over
the null hypothesis H0 and captures the principal features of Wolf's die; yet there
are indications that an additional very tiny imperfection may have been present.
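The fit under hypothesis H2 can be sketched as follows. This is not Jaynes' own procedure: the Newton iteration used to solve the two equations for the Lagrange parameters is a choice made here for illustration, and the second observable is taken to be $+1$ on faces 1, 2, 5, 6 and $-2$ on faces 3, 4, an assumption that reproduces $g_2 = 0.1393$ from the frequencies in Table 1. All names are hypothetical.

```python
import math

G = [
    [i - 3.5 for i in range(1, 7)],    # first observable: i - 3.5
    [1, 1, -2, -2, 1, 1],              # second observable (assumed form, see lead-in)
]
g = [0.0983, 0.1393]                   # measured averages quoted in the text

def distribution(lam):
    """Maximal distribution and partition function for a uniform prior."""
    weights = [math.exp(-sum(l * Ga[i] for l, Ga in zip(lam, G))) for i in range(6)]
    Z = sum(weights)
    return [w / Z for w in weights], Z

def averages(f):
    return [sum(fi * Gi for fi, Gi in zip(f, Ga)) for Ga in G]

def solve_lagrange(g, steps=30):
    """Newton iteration for d(ln Z)/d(lambda^a) = -g_a (two unknowns)."""
    lam = [0.0, 0.0]
    for _ in range(steps):
        f, _ = distribution(lam)
        mean = averages(f)
        cov = [[sum(fi * G[a][i] * G[b][i] for i, fi in enumerate(f)) - mean[a] * mean[b]
                for b in range(2)] for a in range(2)]
        r = [g[a] - mean[a] for a in range(2)]           # constraint residuals
        det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
        step = [(cov[1][1] * r[0] - cov[0][1] * r[1]) / det,
                (cov[0][0] * r[1] - cov[1][0] * r[0]) / det]
        lam = [lam[a] - step[a] for a in range(2)]
    return lam

lam = solve_lagrange(g)
f_fit, Z = distribution(lam)
I_max = math.log(1 / 6) + math.log(Z) + sum(l * ga for l, ga in zip(lam, g))
delta_I = I_max - (-0.006769)          # observed I_p(f) as quoted in the text
s = (6 - 2 - 3) / 2
x = 20000 * delta_I
prob = x ** s * math.exp(-x) / math.gamma(s + 1)         # asymptotic formula (16)
print(lam, I_max, delta_I, prob)
```

Because the averages enter only to four decimal places, the computed $I_p^{\max(H2)}$ should be expected to agree with the quoted value only to about that accuracy; the resulting probability nevertheless comes out at the percent level, far above the $10^{-56}$ of the null hypothesis.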

Jaynes' analysis of Wolf's die data furnishes a useful paradigm for the experimental
method in general. All modern experiments at particle colliders (CERN,
DESY, Fermilab, ...), for example, yield data in the form of frequency distributions
over discrete "bins" in momentum space, for each of the various end products 
of the collision. The search for interesting signals in the data (new particles, 
new interactions, etc.) essentially proceeds in the same manner in which Jaynes 
revealed the imperfections of Wolf's die: by formulating physically motivated 
hypotheses and testing them against the data. Such a test is always statistical in 
nature. Conclusions (say, about the presence of a top quark, or about the pres- 
ence of a certain imperfection of Wolf's die) can never be drawn with absolute 
certainty but only at some -quantifiable- confidence level. 

1.6 Conclusion 

In all our considerations a crucial role has been played by the quantity $I_p$: The
algorithm that yields the best estimate for an unknown frequency distribution is
based on the maximization of $I_p$; and hypotheses can be tested with the help of
Eq. (16), i.e., by simply comparing the experimental and theoretical values of $I_p$.
We shall soon encounter the quantity $I_p$ again and see how it is related to one of
the most fundamental concepts in statistical mechanics: the "entropy."

2 Macroscopic Systems in Equilibrium 

2.1 Macrostate 

For complex systems with many degrees of freedom (like a gas, fluid or plasma)
the exact microstate is usually not known. It is therefore impossible to assign to
the system a unique point in phase space (classical) or a unique wave function
(quantal), respectively. Instead one must resort to a statistical description: The
system is described by a classical phase space distribution $\rho(\pi)$ or an incoherent
mixture

$$ \rho = \sum_i f_i\, |i\rangle\langle i| \qquad (48) $$

of mutually orthogonal quantum microstates $\{|i\rangle\}$, respectively. (Where the
distinction between classical and quantal does not matter we shall use the generic
symbol $\rho$.) Probabilities must be real, non-negative, and normalized to one;
which implies the respective properties

$$ \rho(\pi)^* = \rho(\pi) , \quad \rho(\pi) \geq 0 , \quad \int d\pi\, \rho(\pi) = 1 \qquad (49) $$

or

$$ \rho^\dagger = \rho , \quad \rho \geq 0 , \quad \mathrm{tr}\, \rho = 1 . \qquad (50) $$

In this statistical description every observable $A$ (real phase space function or
Hermitian operator, respectively) is assigned an expectation value

$$ \langle A \rangle_\rho = \int d\pi\, \rho(\pi) A(\pi) \qquad (51) $$

or

$$ \langle A \rangle_\rho = \mathrm{tr}(\rho A) , \qquad (52) $$

respectively.

Typically, not even the distribution $\rho$ is a priori known. Rather, the state of
a complex physical system is characterized by very few macroscopic data. These
data may come in different forms:

• as data given with certainty, such as the type of particles that make up the
system, or the shape and volume of the box in which they are enclosed.
These exact data we take into account through the definition of the phase
space or Hilbert space in which we are working;

• as prescribed expectation values

$$ \langle G_a \rangle_\rho = g_a , \quad a = 1 \ldots m \qquad (53) $$

of some set $\{G_a\}$ of selected macroscopic observables. Examples might be
the average total energy, average angular momentum, or average magnetization.
Such data, which are of a statistical nature, impose constraints of
the type (1) on the distribution $\rho$; or

• as additional control parameters on which the selected observables $\{G_a\}$
may explicitly depend, such as an external electric or magnetic field.
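According to our general considerations in Section 1.3 the best estimate for the
thus characterized macrostate is a distribution of the form (18). In the classical
case this implies

$$ \rho(\pi) = \frac{1}{Z} \exp\left[ \ln \sigma(\pi) - \langle \ln \sigma \rangle_\sigma - \sum_a \lambda^a G_a(\pi) \right] \qquad (54) $$

with

$$ Z = \int d\pi\, \exp\left[ \ln \sigma(\pi) - \langle \ln \sigma \rangle_\sigma - \sum_a \lambda^a G_a(\pi) \right] ; \qquad (55) $$

while for a quantum system

$$ \rho = \frac{1}{Z} \exp\left[ \ln \sigma - \langle \ln \sigma \rangle_\sigma - \sum_a \lambda^a G_a \right] \qquad (56) $$

and

$$ Z = \mathrm{tr}\, \exp\left[ \ln \sigma - \langle \ln \sigma \rangle_\sigma - \sum_a \lambda^a G_a \right] . \qquad (57) $$

In both cases $\sigma$ denotes the a priori distribution. The auxiliary quantity $Z$ is
referred to as the partition function.^5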

The phase space integral or trace in the respective expressions for $Z$ depends
on the specific choice of the phase space or Hilbert space; hence it may depend
on parameters like the volume or particle number. Furthermore, there may be an
explicit dependence of the observables $\{G_a\}$ or of the a priori distribution $\sigma$ on
additional control parameters. Therefore, the partition function generally depends
not just on the Lagrange multipliers $\{\lambda^a\}$ but also on some other parameters
$\{h^b\}$. In analogy with the relation (22) one then defines new variables

$$ \gamma_b := \frac{\partial}{\partial h^b} \ln Z . \qquad (58) $$

(In contrast to (22) there is no minus sign.) The $\{g_a\}$, $\{\lambda^a\}$, $\{h^b\}$ and $\{\gamma_b\}$ are
called the thermodynamic variables of the system; together they specify the
system's macrostate. The thermodynamic variables are not all independent: Rather,
they are related by (22) and (58), that is, via partial derivatives of $\ln Z$. One
says that $h^b$ and $\gamma_b$, or $g_a$ and $\lambda^a$, are conjugate to each other.

Some combinations of thermodynamic variables are of particular importance,
which is why the associated distributions go by special names. If the observables
that characterize the macrostate -in the form of sharp values given with certainty,
or in the form of expectation values- are all constants of the motion then the
system is said to be in equilibrium. Associated is an equilibrium distribution
of the form (54) or (56), with all $\{G_a\}$ being constants of the motion. Such
an equilibrium distribution is itself constant in time, and so are all expectation
values calculated from it.^6 The set of constants of the motion always includes the
Hamiltonian (Hamilton function or Hamilton operator, respectively) provided it
is not explicitly time-dependent. If its value for a specific system, the internal
energy, and the other macroscopic data are all given with certainty then the
resulting equilibrium distribution is called microcanonical; if just the energy is
given on average, while all other data are given with certainty, canonical; and if
both energy and total particle number are given on average, while all other data
are given with certainty, grand canonical.

^5 Readers already familiar with statistical mechanics might be disturbed by the appearance
of $\sigma$ in the definitions of $\rho$ and $Z$. Yet this is essential for a consistent formulation of the
theory: see, for instance, our remarks at the end of Section 1.3 on the possibility of iterating
the frequency estimation algorithm. In most practical applications $\sigma$ is uniform and hence
$\ln \sigma - \langle \ln \sigma \rangle_\sigma = 0$. Our definitions of $\rho$ and $Z$ then reduce to the conventional expressions.

Strictly speaking, every description of the macrostate in terms of thermodynamic
variables represents a hypothesis: namely, the hypothesis that the sets
$\{G_a\}$ and $\{h^b\}$ are actually complete. This is analogous to Jaynes' model for
Wolf's die, which assumes that just two imperfections (associated with two
observables $G_1$, $G_2$) suffice to characterize the experimental data. Such a hypothesis
may well be rejected by experiment. If so, this does not mean that our rationale
for constructing $\rho$ -maximizing $I_\sigma$ under given constraints- was wrong. Rather,
it means that important macroscopic observables or control parameters (such as
"hidden" constants of the motion, or further imperfections of Wolf's die) have
been overlooked, and that the correct description of the macrostate requires
additional thermodynamic variables.

2.2 First law of thermodynamics 

Changing the values of the thermodynamic variables alters the distribution $\rho$ and
with it the associated

$$ I_\sigma^{\max} = I_\sigma(\rho) = \langle \ln \sigma \rangle_\sigma + \ln Z + \sum_a \lambda^a g_a . \qquad (59) $$

By virtue of Eqs. (22) and (58) its infinitesimal variation is given by

$$ dI_\sigma^{\max} = d\langle \ln \sigma \rangle_\sigma + \sum_a \lambda^a dg_a + \sum_b \gamma_b\, dh^b . \qquad (60) $$

As the set of constants of the motion always contains the Hamiltonian, its value for
the given system, the internal energy $U$, and the associated conjugate parameter,
which we denote by $\beta$, play a particularly important role. Depending on whether
the energy is given with certainty or on average, the pair $(U, \beta)$ corresponds to a
pair $(h, \gamma)$ or $(g, \lambda)$. For all remaining variables one then defines new conjugate
parameters

$$ l^a := \lambda^a/\beta , \quad m_b := \gamma_b/\beta \qquad (61) $$

such that in terms of these new parameters the energy differential reads

$$ dU = \beta^{-1} d(I_\sigma^{\max} - \langle \ln \sigma \rangle_\sigma) - \sum_a l^a dg_a - \sum_b m_b\, dh^b . \qquad (62) $$

^6 Here we have assumed that there is no time-dependence of the a priori distribution $\sigma$.

A change in internal energy that is effected solely by a variation of the
parameters $\{g_a\}$ or $\{h^b\}$ is defined as work

$$ \delta W := -\sum_a l^a dg_a - \sum_b m_b\, dh^b ; \qquad (63) $$

some commonly used pairs $(g, l)$ and $(h, m)$ of thermodynamic variables are listed
in Table 2. If, on the other hand, these parameters are held fixed ($dg_a = dh^b = 0$)
then the internal energy can still change through the addition or subtraction of
heat

$$ \delta Q := \frac{1}{k\beta}\, k\, d(I_\sigma^{\max} - \langle \ln \sigma \rangle_\sigma) . \qquad (64) $$

Here we have introduced an arbitrary constant $k$. Provided we choose this
constant to be the Boltzmann constant

$$ k = 1.381 \times 10^{-23}\ \mathrm{J/K} , \qquad (65) $$

we can identify the temperature

$$ T := \frac{1}{k\beta} \qquad (66) $$

and the entropy

$$ S := k\,(I_\sigma^{\max} - \langle \ln \sigma \rangle_\sigma) \qquad (67) $$

to write $\delta Q$ in the more familiar form

$$ \delta Q = T\, dS . \qquad (68) $$

The entropy is related to the other thermodynamic variables via Eq. (59), i.e.,^7

$$ S = k \ln Z + k \sum_a \lambda^a g_a . \qquad (69) $$

The relation

$$ dU = \delta Q + \delta W , \qquad (70) $$

which reflects nothing but energy conservation, is known as the first law of
thermodynamics.

^7 Even though the entropy, like the partition function, is related to measurable quantities
it is essentially an auxiliary concept and does not itself constitute a physical observable: In
quantum mechanics, for example, there is nothing like a Hermitian "entropy operator."
  $(g, l)$          $(h, m)$        names
  -                 $(V, p)$        volume, pressure
  $(N, -\mu)$       $(N, -\mu)$     particle number, chemical potential
  $(M, -B)$         $(B, M)$        magnetic induction, magnetization
  $(P, -E)$         $(E, P)$        electric field, electric polarization
  $(p, -v)$         -               momentum, velocity
  $(L, -\omega)$    -               angular momentum, angular velocity

Table 2: Some commonly used pairs of thermodynamic variables. In cases where
two pairs are given, e.g., $(M, -B)$ and $(B, M)$, the proper choice depends on the
specific situation: For example, the pair $(M, -B)$ is adequate if the magnetization
$M$ is a constant of the motion whose value is given on average; while the pair
$(B, M)$ should be used if there is an externally applied magnetic field $B$ which
plays the role of a control parameter.



2.3 Example: Ideal quantum gas 

We consider a gas of non-interacting bosons or fermions. We suppose that the
total particle number is not given with certainty (but possibly on average, as in
the grand canonical ensemble) so the system must be described in Fock space. We
further suppose that the observables $\{G_a\}$ whose expectation values are furnished
as macroscopic data are all of the single-particle form

$$ G_a = \sum_i G_a^i N_i , \qquad (71) $$

where the $\{G_a^i\}$ are arbitrary (c-number) coefficients and the $\{N_i\}$ denote number
operators pertaining to some orthonormal basis of single-particle states.

Provided the a priori distribution $\sigma$ is uniform, the best estimate for the
macrostate has the form

$$ \rho = \frac{1}{Z} \exp\left( -\sum_i \alpha^i N_i \right) \qquad (72) $$

with

$$ \alpha^i = \sum_a \lambda^a G_a^i . \qquad (73) $$

For example, in the grand canonical ensemble (energy and total particle number
given on average) the parameters $\{\alpha^i\}$ are functions of the single-particle energies
$\{\epsilon^i\}$, the inverse temperature $\beta$ and the chemical potential $\mu$:

$$ \alpha^i = \beta(\epsilon^i - \mu) . \qquad (74) $$

The partition function

$$ Z = \mathrm{tr}\, \exp\left( -\sum_i \alpha^i N_i \right) = \sum_{\text{configurations } \{N_1, N_2, \ldots\}} \exp\left( -\sum_i \alpha^i N_i \right) \qquad (75) $$

factorizes, for we work in Fock space where we sum freely over each $N_i$:

$$ Z = \prod_i \sum_{N_i} e^{-\alpha^i N_i} =: \prod_i Z_i . \qquad (76) $$

The sum over $N_i$ extends from 0 to the maximum value allowed by particle
statistics: $\infty$ for bosons, 1 for fermions. Consequently, each factor $Z_i$ reads

$$ Z_i = \left( 1 \mp e^{-\alpha^i} \right)^{\mp 1} , \qquad (77) $$

the upper sign pertaining to bosons and the lower sign to fermions. This gives

$$ \ln Z = \mp \sum_i \ln\left( 1 \mp e^{-\alpha^i} \right) \qquad (78) $$

and hence the average occupation

$$ n_i := \langle N_i \rangle_\rho = -\frac{\partial}{\partial \alpha^i} \ln Z = \left( e^{\alpha^i} \mp 1 \right)^{-1} \qquad (79) $$

of any single-particle state $i$. Using the inverse relation

$$ \alpha^i = \ln(1 \pm n_i) - \ln n_i \qquad (80) $$

together with the specific realization of Eq. (69),

$$ S = k \ln Z + k \sum_i \alpha^i n_i , \qquad (81) $$

we find for the entropy

$$ S = -k \sum_i \left[ n_i \ln n_i \mp (1 \pm n_i) \ln(1 \pm n_i) \right] . \qquad (82) $$
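The chain of formulas for the ideal quantum gas can be checked numerically. In the sketch below the stacked signs are encoded by a parameter `eta` (+1 for bosons, -1 for fermions); this parametrization, the function names and the sample values of the parameters are assumptions made for illustration. The entropy is computed once from the partition function and once from the occupation numbers alone.

```python
import math

def ln_Z(alpha, eta):
    """ln Z_i for one single-particle state; eta = +1 bosons, -1 fermions."""
    return -eta * math.log(1 - eta * math.exp(-alpha))

def occupation(alpha, eta):
    """Average occupation n_i = 1 / (exp(alpha) - eta)."""
    return 1.0 / (math.exp(alpha) - eta)

def entropy_from_Z(alphas, eta, k=1.0):
    """S = k ln Z + k sum_i alpha^i n_i."""
    return k * sum(ln_Z(a, eta) + a * occupation(a, eta) for a in alphas)

def entropy_from_n(alphas, eta, k=1.0):
    """S = -k sum_i [n ln n - eta (1 + eta n) ln(1 + eta n)]."""
    total = 0.0
    for a in alphas:
        n = occupation(a, eta)
        total += n * math.log(n) - eta * (1 + eta * n) * math.log(1 + eta * n)
    return -k * total

alphas = [0.5, 1.0, 2.0]          # sample values; bosons require alpha > 0
for eta in (+1, -1):
    print(eta, entropy_from_Z(alphas, eta), entropy_from_n(alphas, eta))
```

Both expressions give the same entropy for either statistics, which confirms that the signs in the occupation number, the inverse relation and the entropy formula are mutually consistent.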

2.4 Thermodynamic potentials 

Like the partition function, thermodynamic potentials are auxiliary quantities
used to facilitate calculations. One example is the (generalized) grand potential

$$ \Omega(T, l^a, h^b) := -\frac{1}{\beta} \ln Z , \qquad (83) $$

related to the internal energy $U$ via

$$ \Omega = U - TS + \sum_a l^a g_a . \qquad (84) $$

Its differential

$$ d\Omega = -S\, dT + \sum_a g_a\, dl^a - \sum_b m_b\, dh^b \qquad (85) $$

shows that $S$, $g_a$ and $m_b$ can be obtained from the grand potential by partial
differentiation; e.g.,

$$ S = -\left( \frac{\partial \Omega}{\partial T} \right)_{l^a, h^b} , \qquad (86) $$

where the subscript means that the partial derivative is to be taken at fixed $l^a$, $h^b$.
In addition to the grand potential there are many other thermodynamic potentials:
Their definition and properties are best summarized in a Born diagram (Fig.
1). In a given physical situation it is most convenient to work with that potential
which depends on the variables being controlled or measured in the experiment.
For example, if a chemical reaction takes place at constant temperature and pressure
(controlled variables $T$, $\{m_b\} = \{p\}$), and the observables of interest are the
particle numbers of the various reactants (measured variables $\{g_a\} = \{N_i\}$) then
the reaction is most conveniently described by the free enthalpy $G(T, N_i, p)$.

When a large system is physically divided into several subsystems then in
these subsystems the thermodynamic variables generally take values that differ
from those of the total system. In the special case of a homogeneous system all
variables of interest can be classified either as extensive -varying proportionally
to the volume of the respective subsystem- or intensive -remaining invariant
under the subdivision of the system. Examples for the former are the volume
itself, the internal energy or the number of particles; whereas amongst the latter
are the pressure, the temperature or the chemical potential. In general, if a
thermodynamic variable is extensive then its conjugate is intensive, and vice
versa. If we assume that the temperature and the $\{l^a\}$ are intensive, while the
$\{h^b\}$ and the grand potential are extensive, then

$$ \Omega_{\mathrm{hom}}(T, l^a, \tau h^b) = \tau\, \Omega_{\mathrm{hom}}(T, l^a, h^b) \quad \forall\ \tau > 0 \qquad (87) $$

and hence

$$ \Omega_{\mathrm{hom}} = -\sum_b m_b h^b . \qquad (88) $$

This implies the Gibbs-Duhem relation

$$ S\, dT - \sum_a g_a\, dl^a - \sum_b h^b\, dm_b = 0 . \qquad (89) $$

For an ideal gas in the grand canonical ensemble, for instance, we have the
temperature $T$ and the chemical potential $\{l^a\} = \{-\mu\}$ intensive, whereas the
volume $\{h^b\} = \{V\}$ and the grand potential $\Omega$ are extensive; hence

$$ \Omega_{\mathrm{i.gas}}(T, \mu, V) = -p(T, \mu)\, V . \qquad (90) $$
Figure 1: Born diagram. Corners correspond to thermodynamic potentials: the
grand potential $\Omega$, the free energy F, the internal energy U, the enthalpy H, the
free enthalpy G, the potential S (which vanishes for a homogeneous system), as
well as two rarely used potentials. Sides of the cube correspond to
thermodynamic variables: T, S, g, l, h and m. Opposite sides are conjugate to
each other, and associated with each conjugate pair is a dotted "basis vector."
Each corner is a function of the adjacent sides; e.g., the enthalpy H is a function
of {S, g, m}. Their conjugates {T, l, h} can be obtained from H by partial
differentiation, the sign depending on whether the requested conjugate variable is at
the head (-) or tail (+) of a basis vector; e.g., $T = +\partial H/\partial S$. One can go from
one corner to the next by moving parallel or antiparallel to a basis vector, thereby
(i) changing variables such as to get the correct dependence of the new potential,
and (ii) adding (if moving parallel) or subtracting (if moving antiparallel) the
product of the conjugate variables that are associated with the basis vector. For
instance, in order to obtain the free enthalpy G from the enthalpy H one (i) uses
$T = +\partial H/\partial S$ to solve for S(T, g, m), since the free enthalpy will be a function
of {T, g, m} rather than {S, g, m}; and then (ii) subtracts the product TS to get
$G(T, g, m) = H(S(T, g, m), g, m) - T \, S(T, g, m)$. This procedure is known as a
Legendre transformation. Successive application allows one to calculate all
thermodynamic potentials from the grand potential $\Omega$, and hence, ultimately, from
the partition function Z.
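
The two steps of the Legendre transformation described in the caption can be traced on a toy example. The sketch below assumes a purely hypothetical enthalpy H(S) = a S^2 / 2, with the dependence on g and m suppressed; the function names and the value of a are illustrative only.

```python
# Hypothetical toy enthalpy (an assumption for illustration): H(S) = a * S**2 / 2.
A_COEFF = 2.0

def enthalpy(S, a=A_COEFF):
    return 0.5 * a * S * S

def entropy_of_T(T, a=A_COEFF):
    """Step (i): invert T = +dH/dS = a * S to obtain S(T)."""
    return T / a

def free_enthalpy(T, a=A_COEFF):
    """Step (ii): G(T) = H(S(T)) - T * S(T)."""
    S = entropy_of_T(T, a)
    return enthalpy(S, a) - T * S

def derivative(f, x, h=1e-5):
    """Central finite difference, used only to check the conjugate relation."""
    return (f(x + h) - f(x - h)) / (2 * h)

T0 = 1.7
# After the transformation the roles are swapped: S = -dG/dT.
S_from_G = -derivative(free_enthalpy, T0)
```

For this toy model the result is G(T) = -T^2 / (2a), and differentiating G with respect to T returns the entropy with the opposite sign, as the head/tail rule of the diagram prescribes.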
2.5 Correlations

Arbitrary expectation values $\langle A\rangle_\rho$ in the macrostate $\rho$ (cf. (56))
depend on the Lagrange multipliers $\{\lambda^a\}$ as well as -possibly- on other
parameters $\{h^b\}$. If the Lagrange multipliers vary infinitesimally while the
$\{h^b\}$ are held fixed, the expectation value $\langle A\rangle_\rho$ changes according to

$$d\langle A\rangle_\rho = -\sum_a \langle \delta G_a; A\rangle_\rho \, d\lambda^a . \qquad (91)$$

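Here $\langle \, ; \rangle_\rho$ is the canonical correlation function with respect to the state $\rho$:

$$\langle A; B\rangle_\rho := \int d\pi \, \rho(\pi) \, A(\pi)^* B(\pi) \qquad (92)$$

in the classical case or

$$\langle A; B\rangle_\rho := \int_0^1 d\nu \ {\rm tr}\left[\rho^\nu A^\dagger \rho^{1-\nu} B\right] \qquad (93)$$

in the quantum case, respectively. The observable $\delta G_a$ is defined as

$$\delta G_a := G_a - \langle G_a\rangle_\rho . \qquad (94)$$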

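The correlation matrix

$$C_{ab} := \langle \delta G_a; \delta G_b\rangle_\rho = -\left(\frac{\partial g_a}{\partial \lambda^b}\right)_{\lambda,h} = \left(\frac{\partial^2}{\partial \lambda^a \partial \lambda^b} \ln Z\right)_{\lambda,h} \qquad (95)$$

thus relates infinitesimal variations of $\lambda$ and g:

$$dg_b = -\sum_a d\lambda^a \, C_{ab} \ , \quad d\lambda^a = -\sum_b dg_b \, (C^{-1})^{ba} . \qquad (96)$$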
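
Relations (95) and (96) can be checked numerically on a small classical system. The following sketch assumes a hypothetical toy model with five microstates, a uniform a priori distribution and two observables; the numbers and all names are chosen for illustration only. In the classical case the canonical correlation of real observables reduces to the ordinary covariance.

```python
import numpy as np

# Hypothetical toy system (an assumption for illustration): five microstates,
# uniform a priori weight, and two observables G_0 and G_1 (one row each).
G = np.array([[0.0, 1.0, 2.0, 3.0, 4.0],
              [1.0, 0.0, 2.0, 1.0, 3.0]])

def weights(lam):
    """Unnormalised weights exp(-sum_a lam^a G_a) of the microstates."""
    return np.exp(-lam @ G)

def log_Z(lam):
    return np.log(weights(lam).sum())

def expectations(lam):
    """g_a = <G_a> in the state rho = weights / Z."""
    w = weights(lam)
    return G @ (w / w.sum())

def correlation_matrix(lam):
    """C_ab = <dG_a; dG_b>, classically the covariance matrix of the G_a."""
    w = weights(lam)
    rho = w / w.sum()
    dG = G - expectations(lam)[:, None]
    return (dG * rho) @ dG.T

def hessian(f, x, h=1e-4):
    """Matrix of second partial derivatives by central finite differences."""
    n = len(x)
    out = np.zeros((n, n))
    for a in range(n):
        for b in range(n):
            ea = np.zeros(n)
            eb = np.zeros(n)
            ea[a] = h
            eb[b] = h
            out[a, b] = (f(x + ea + eb) - f(x + ea - eb)
                         - f(x - ea + eb) + f(x - ea - eb)) / (4 * h * h)
    return out

lam = np.array([0.3, 0.5])
C = correlation_matrix(lam)
C_from_lnZ = hessian(log_Z, lam)      # right-hand side of eq. (95)

# Eq. (96): a small change d_lam shifts g by approximately -C d_lam.
d_lam = np.array([1e-5, -2e-5])
dg_exact = expectations(lam + d_lam) - expectations(lam)
dg_linear = -C @ d_lam
```

The covariance matrix and the finite-difference Hessian of ln Z agree to within the discretisation error, and the linear prediction for dg matches the exact shift to second order in the variation.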

The subscripts $\lambda$, h of the partial derivatives indicate that they must be taken
with all other $\{\lambda^a\}$ and all $\{h^b\}$ held fixed. Returning to our example of the ideal
quantum gas, we immediately obtain from (79) the correlation of occupation
numbers

$$\langle \delta N_i; \delta N_j\rangle_\rho = -\frac{\partial n_i}{\partial \lambda^j} = \delta_{ij} \, n_i (1 \pm n_i) . \qquad (97)$$
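
The diagonal part of (97) is easy to verify numerically. The sketch below assumes the standard Bose and Fermi occupation numbers n = 1/(exp(alpha) -/+ 1), where alpha stands for the Lagrange multiplier conjugate to the occupation number of the level; this parametrisation and the function names are assumptions of the example, since equation (79) is not reproduced here.

```python
import math

def occupation(alpha, sign):
    """Mean occupation n = 1 / (exp(alpha) - sign).

    sign = +1 gives Bose statistics, sign = -1 gives Fermi statistics.
    alpha is the (assumed) Lagrange multiplier conjugate to N_i.
    """
    return 1.0 / (math.exp(alpha) - sign)

def fluctuation_from_derivative(alpha, sign, h=1e-6):
    """-dn/dalpha by a central finite difference (middle term of eq. (97))."""
    return -(occupation(alpha + h, sign) - occupation(alpha - h, sign)) / (2 * h)

def fluctuation_closed_form(alpha, sign):
    """n (1 +/- n), upper sign for bosons, lower sign for fermions."""
    n = occupation(alpha, sign)
    return n * (1.0 + sign * n)

alpha = 0.8
bose_gap = fluctuation_from_derivative(alpha, +1) - fluctuation_closed_form(alpha, +1)
fermi_gap = fluctuation_from_derivative(alpha, -1) - fluctuation_closed_form(alpha, -1)
```

Bosons thus fluctuate more strongly than classical particles (factor 1 + n), fermions less strongly (factor 1 - n).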

3 Linear Response 

3.1 Liouvillian and Evolution 

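The dynamics of an expectation value $\langle A\rangle_\rho$ is governed by the equation of motion

$$\frac{d}{dt}\langle A\rangle_\rho(t) = \left\langle i\mathcal{L}A + \frac{\partial A}{\partial t}\right\rangle_\rho(t) . \qquad (98)$$

Here we have allowed for an explicit time-dependence of the observable A. Classically,
the Liouvillian $\mathcal{L}$ takes the Poisson bracket with the Hamilton function

$$i\mathcal{L} = \sum_j \left(\frac{\partial H}{\partial P_j}\frac{\partial}{\partial Q^j} - \frac{\partial H}{\partial Q^j}\frac{\partial}{\partial P_j}\right) \qquad (99)$$

in canonical coordinates $\pi = \{Q^j, P_j\}$; whereas in the quantum case it takes the
commutator with the Hamilton operator H,

$$i\mathcal{L} = (i/\hbar)[H, *] . \qquad (100)$$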

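An observable A for which $i\mathcal{L}A + \partial A/\partial t = 0$ is called a constant of the motion;
a state $\rho$ for which $\mathcal{L}\rho = 0$ is called stationary. Only for a stationary $\rho$ is the
Liouvillian Hermitian with respect to the canonical correlation function,

$$\langle A; \mathcal{L}B\rangle_\rho = \langle \mathcal{L}A; B\rangle_\rho \quad \forall \ A, B . \qquad (101)$$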

The evolver $\mathcal{U}$ is defined as the solution of the differential equation

$$\frac{\partial}{\partial t}\mathcal{U}(t_0,t) = i \, \mathcal{U}(t_0,t) \, \mathcal{L} \qquad (102)$$

with initial condition $\mathcal{U}(t_0,t_0) = 1$. As long as the Liouvillian $\mathcal{L}$ is not explicitly
time-dependent, the solution has the simple exponential form

$$\mathcal{U}(t_0,t) = \exp[i(t-t_0)\mathcal{L}] ; \qquad (103)$$

however, we shall not assume this in the following. The evolver determines -at
least formally- the evolution of expectation values via

$$\langle A\rangle_\rho(t) = \langle \mathcal{U}(t_0,t)A\rangle_\rho(t_0) . \qquad (104)$$
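
For a time-independent Liouvillian the exponential form (103) can be made concrete on a two-level quantum system. The sketch below is an illustration under stated assumptions: the Hamiltonian H, the observable A and the initial state rho0 are hypothetical, hbar is set to 1, and the Liouvillian is represented as a matrix acting on row-major vectorised operators.

```python
import numpy as np

hbar = 1.0
# Hypothetical two-level system (all numbers are assumptions for illustration).
H = np.array([[1.0, 0.4 - 0.2j],
              [0.4 + 0.2j, -0.5]])                     # Hermitian Hamiltonian
A = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)  # observable
rho0 = np.array([[0.7, 0.1j], [-0.1j, 0.3]])           # initial state, trace 1

def exp_i_hermitian(M, t):
    """exp(i t M) for a Hermitian matrix M via its eigen-decomposition."""
    w, v = np.linalg.eigh(M)
    return (v * np.exp(1j * t * w)) @ v.conj().T

# Liouvillian L = (1/hbar) [H, *] as a matrix on row-major vectorised operators:
# vec(H A - A H) = (H kron 1 - 1 kron H^T) vec(A).
identity = np.eye(2)
L = (np.kron(H, identity) - np.kron(identity, H.T)) / hbar

t = 0.9
U = exp_i_hermitian(L, t)                    # eq. (103) with t0 = 0
A_t = (U @ A.reshape(-1)).reshape(2, 2)      # U(0, t) A

# Independent check in the Heisenberg picture: exp(iHt/hbar) A exp(-iHt/hbar).
V = exp_i_hermitian(H, t / hbar)
A_heisenberg = V @ A @ V.conj().T

expectation = np.trace(rho0 @ A_t).real      # eq. (104)
```

The evolved observable obtained from the Liouvillian coincides with the familiar Heisenberg-picture result, and its expectation value in the initial state equals the expectation value of A in the Schroedinger-evolved state.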

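Multiplication with a step function

$$\theta(t-t_0) = \begin{cases} 0 & : \ t \le t_0 \\ 1 & : \ t > t_0 \end{cases} \qquad (105)$$

yields the so-called causal evolver

$$\mathcal{U}_<(t_0,t) := \mathcal{U}(t_0,t) \cdot \theta(t-t_0) \qquad (106)$$

(where '<' symbolizes $t_0 < t$), which satisfies another differential equation

$$\frac{\partial}{\partial t}\mathcal{U}_<(t_0,t) = i \, \mathcal{U}_<(t_0,t) \, \mathcal{L} + \delta(t-t_0) . \qquad (107)$$

If a (possibly time-dependent) perturbation is added to the Liouvillian,

$$\mathcal{L}^{(V)} := \mathcal{L} + \mathcal{V} , \qquad (108)$$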
then the perturbed causal evolver $\mathcal{U}_<^{(V)}$ is related to the unperturbed $\mathcal{U}_<$ by an
integral equation

$$\mathcal{U}_<^{(V)}(t_0,t) = \mathcal{U}_<(t_0,t) + \int_{-\infty}^{\infty} dt' \, \mathcal{U}_<^{(V)}(t_0,t') \, i\mathcal{V}(t') \, \mathcal{U}_<(t',t) . \qquad (109)$$

Iteration of this integral equation -re-expressing the $\mathcal{U}_<^{(V)}(t_0,t')$ in the integrand
in terms of another sum of the form (109), and so on- yields an infinite series, the
terms being of increasing order in $\mathcal{V}$. Truncating this series after the term of order
$\mathcal{V}^n$ gives an approximation to the exact causal evolver in n-th order perturbation
theory.
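
The first-order truncation of this series can be tested numerically. In the sketch below the Liouvillian and the perturbation are replaced by small random Hermitian matrices that do not depend on time; this stand-in, the matrix size, the quadrature rule and all names are assumptions made for illustration. The causal step functions simply restrict the integration in (109) to the interval between 0 and t.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(n):
    """Random Hermitian matrix, a stand-in for a Liouvillian (assumption)."""
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

def evolver(L, t):
    """U(0, t) = exp(i t L) for a Hermitian, time-independent L."""
    w, v = np.linalg.eigh(L)
    return (v * np.exp(1j * t * w)) @ v.conj().T

def first_order(L, V, t, steps=2000):
    """U(0,t) + int_0^t dt' U(0,t') iV U(t',t), using the trapezoidal rule."""
    ts = np.linspace(0.0, t, steps + 1)
    vals = np.array([evolver(L, s) @ (1j * V) @ evolver(L, t - s) for s in ts])
    dt = ts[1] - ts[0]
    integral = dt * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)
    return evolver(L, t) + integral

L0 = random_hermitian(3)
V0 = random_hermitian(3)
t = 1.0

def truncation_error(eps):
    """Distance between the exact evolver and its first-order approximation."""
    exact = evolver(L0 + eps * V0, t)
    return np.linalg.norm(exact - first_order(L0, eps * V0, t))

# Halving the perturbation strength should reduce the error by about 4,
# since the neglected terms start at second order in V.
ratio = truncation_error(0.02) / truncation_error(0.01)
```

A ratio close to 4 confirms that the error of the first-order approximation is quadratic in the perturbation, as expected from the structure of the series.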



3.2 Kubo formula 

The Kubo formula describes the response of a system to weak time-dependent
external fields $\phi^\alpha(t)$. Before t = 0 the external fields are zero and the system is
assumed to be in an initial equilibrium state

$$\rho(0) = \frac{1}{Z} \exp\left(-\sum_a \lambda^a G_a[0]\right) \qquad (110)$$

characterized by some set $\{G_a[0]\}$ of constants of the motion at zero field (and
with the a priori distribution $\sigma$ taken to be uniform). Then the external fields
are switched on.

How does an arbitrary expectation value $\langle A\rangle(t)$ evolve in response to this external
perturbation? The general solution is

$$\langle A\rangle(t) = \langle \mathcal{U}_<^{(V)}(0,t) A\rangle_0 , \qquad (112)$$

where $\langle \, \rangle_0$ stands for the expectation value in the initial equilibrium state $\rho(0)$. We
assume that the observable A does not depend explicitly on time or on the fields
$\phi^\alpha(t)$. The Hamiltonian $H[\phi]$ and with it the Liouvillian $\mathcal{L}[\phi]$, on the other hand,
generally do depend on the external fields. Provided the fields are sufficiently
weak, the Liouvillian may be expanded linearly:

$$\mathcal{L}[\phi(t)] \approx \mathcal{L}[0] + \sum_\alpha \left.\frac{\partial \mathcal{L}[\phi]}{\partial \phi^\alpha}\right|_{\phi=0} \phi^\alpha(t) . \qquad (113)$$

The zero-field Liouvillian $\mathcal{L}[0]$ is assumed to be not explicitly time-dependent;
the linear correction to it generally is, and may be regarded as a time-dependent
perturbation $\mathcal{V}(t)$. Application of first order time-dependent perturbation theory
then yields the evolver in terms of $\mathcal{V}(t)$ and the zero-field evolver $\mathcal{U}_<$.



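Assuming for simplicity that $\langle A\rangle_0 = 0$ we thus find

$$\langle A\rangle(t) = \sum_\alpha \int_{-\infty}^{\infty} dt' \left\langle i \left.\frac{\partial \mathcal{L}[\phi]}{\partial \phi^\alpha}\right|_{\phi=0} \mathcal{U}_<(t',t) A \right\rangle_0 \phi^\alpha(t') . \qquad (114)$$

With the help of the mathematical identity (prove it!)

$$\langle i\mathcal{L}' B\rangle_0 = \sum_a \lambda^a \langle i\mathcal{L}' G_a[0]; B\rangle_0 , \qquad (115)$$

which holds for any Liouvillian $\mathcal{L}'$ and any observable B, we can also write

$$\langle A\rangle(t) = \sum_\alpha \sum_a \lambda^a \int_{-\infty}^{\infty} dt' \left\langle i \left.\frac{\partial \mathcal{L}[\phi]}{\partial \phi^\alpha}\right|_{\phi=0} G_a[0]; \ \mathcal{U}_<(t',t) A \right\rangle_0 \phi^\alpha(t') . \qquad (116)$$

In general, the constants of the motion depend explicitly on the external fields.
They satisfy

$$\mathcal{L}[\phi] \, G_a[\phi] = 0 \quad \forall \ \phi , \qquad (117)$$

yet generally $\mathcal{L}[\phi] \, G_a[\phi'] \ne 0$ for $\phi' \ne \phi$. Together with the Leibniz rule this
implies

$$\left.\frac{\partial \mathcal{L}[\phi]}{\partial \phi^\alpha}\right|_{\phi=0} G_a[0] = -\mathcal{L}[0] \left.\frac{\partial G_a[\phi]}{\partial \phi^\alpha}\right|_{\phi=0} , \qquad (118)$$

which we use to obtain

$$\langle A\rangle(t) = -\sum_\alpha \sum_a \lambda^a \int_{-\infty}^{\infty} dt' \left\langle i\mathcal{L}[0] \left.\frac{\partial G_a[\phi]}{\partial \phi^\alpha}\right|_{\phi=0}; \ \mathcal{U}_<(t',t) A \right\rangle_0 \phi^\alpha(t') . \qquad (119)$$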

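The right-hand side of this equation has the structure of a convolution, so in the
frequency representation we obtain an ordinary product

$$\langle A\rangle(\omega) = \sum_\alpha \chi_\alpha(\omega) \, \phi^\alpha(\omega) . \qquad (120)$$

The coefficient

$$\chi_\alpha(\omega) = -\sum_a \lambda^a \int_0^\infty dt \, \exp(i\omega t) \left\langle i\mathcal{L}[0] \left.\frac{\partial G_a}{\partial \phi^\alpha}\right|_{\phi=0}; \ A(t) \right\rangle_0 \qquad (121)$$

with $A(t) := \mathcal{U}_<(0,t)A$ is called the dynamical susceptibility. The above expression
for the dynamical susceptibility is known as the Kubo formula.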

3.3 Example: Electrical conductivity 

The conductivity $\sigma^{ik}(\omega)$ determines the linear response of the current density j to
a (possibly time-dependent) homogeneous external electric field E. We identify

$$\phi^\alpha \to E_i \ , \quad A \to j^k \ , \quad \chi_\alpha(\omega) \to \sigma^{ik}(\omega) . \qquad (122)$$
Since a conductor is an open system with the number of electrons fixed only
on average, its initial state must be described by a grand canonical ensemble:
$\{G_a[\phi]\} \to \{H[E], N\}$, with associated Lagrange parameters $\{\lambda^a\} \to \{\beta, -\beta\mu\}$.
In principle, the formula for the conductivity then contains both $\partial H/\partial E_i$ and
$\partial N/\partial E_i$; but the latter vanishes, and there remains only

$$\frac{\partial H}{\partial E_i} = -e Q^i , \qquad (123)$$

with $Q^i$ denoting the i-th component of the position observable and e the electron
charge. We use the general formula (121) for the susceptibility to obtain

$$\sigma^{ik}(\omega) = e\beta \int_0^\infty dt \, \exp(i\omega t) \, \langle i\mathcal{L}[0] Q^i; \ j^k(t)\rangle_0 . \qquad (124)$$

The current density is related to the velocity $V^k$ by

$$j^k = e n V^k , \qquad (125)$$

where n is the number density of electrons. Furthermore, $i\mathcal{L}[0]Q^i = V^i$. Hence
the conductivity is proportional to the velocity-velocity correlation:

$$\sigma^{ik}(\omega) = e^2 n \beta \int_0^\infty dt \, \exp(i\omega t) \, \langle V^i; \ V^k(t)\rangle_0 . \qquad (126)$$





This result is rather intuitive. In a dirty metal or semiconductor, for instance, the 
electrons will often scatter off impurities, thereby changing their velocities. As 
a result, the velocity-velocity correlation function will decay rapidly, leading to
a small conductivity. In a clean metal with fewer impurities, on the other hand, 
the velocity-velocity correlation function will decay more slowly, giving rise to a 
correspondingly larger conductivity. 
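
To make this qualitative statement quantitative, one may evaluate (126) for a simple model. The sketch below assumes, purely for illustration, an exponentially decaying velocity-velocity correlation exp(-t/tau) / (beta m) with a relaxation time tau and an equipartition prefactor; neither this model nor the parameter names are part of the derivation above. Under this assumption the integral yields the familiar Drude form n e^2 tau / (m (1 - i omega tau)).

```python
import numpy as np

def conductivity_numeric(omega, tau, e=1.0, n=1.0, beta=1.0, m=1.0,
                         t_max_in_tau=60.0, steps=200_000):
    """Evaluate eq. (126) by the trapezoidal rule for an assumed correlation.

    Assumption (not from the text): <V; V(t)>_0 = exp(-t / tau) / (beta * m).
    The upper limit is cut off at t_max_in_tau * tau, where the correlation
    has decayed to a negligible value.
    """
    t = np.linspace(0.0, t_max_in_tau * tau, steps + 1)
    correlation = np.exp(-t / tau) / (beta * m)
    integrand = np.exp(1j * omega * t) * correlation
    dt = t[1] - t[0]
    integral = dt * (integrand[0] / 2 + integrand[1:-1].sum() + integrand[-1] / 2)
    return e ** 2 * n * beta * integral

def conductivity_drude(omega, tau, e=1.0, n=1.0, m=1.0):
    """Closed form of the same integral: n e^2 tau / (m (1 - i omega tau))."""
    return n * e ** 2 * tau / (m * (1 - 1j * omega * tau))

omega = 0.5
sigma_dirty = conductivity_numeric(omega, tau=0.1)   # frequent scattering
sigma_clean = conductivity_numeric(omega, tau=1.0)   # rare scattering
```

A short relaxation time (dirty sample) gives a small conductivity, a long one (clean sample) a larger conductivity, in line with the argument above.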